Aim

At the end of this Worksheet, you should be able to:

  • use the “Box Model” and the Hypothesis Testing framework to perform a 1 sample Z or T Test.
  • use R Output from t.test to perform a 1 or 2 sample T Test.


Recap on the 1 and 2 sample T Tests

  • The structure of a hypothesis test is H A T P C.

  • See Lecture and Lecture.

  • For a box model constructed around a certain null hypothesis, the observed value of the test statistic is (OV - EV)/SE, where the OV is calculated from the sample and the EV and SE are calculated from the box model.

  • For a 1 Sample Z or T Test, the null hypothesis is about the population mean, so we consider the Sample Mean, and the test statistic has a Z or T distribution respectively, depending on whether we know the population SD.

Test | Test Statistic | P-value Curve
Z | \(\frac{\mbox{observed mean - population mean}}{\mbox{population SD}/\sqrt{n}}\) | Normal
T | \(\frac{\mbox{observed mean - population mean}}{\mbox{sample SD}/\sqrt{n}}\) | \(t_{n-1}\)

  • Most commonly, we use the T Test. The easiest way is to use t.test() in R, which calculates the test statistics and p-values.
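
The two formulas in the table can be sketched as small R helpers. This is only an illustration: the function names z_stat and t_stat, the sample x, and the values mu0 and sigma are hypothetical and are not part of the worksheet data.

```r
# Hypothetical helpers (illustrative names, not from the lectures).
z_stat = function(x, mu0, sigma) {
  # (observed mean - hypothesised mean) / (population SD / sqrt(n))
  (mean(x) - mu0) / (sigma / sqrt(length(x)))
}
t_stat = function(x, mu0) {
  # Same form, but the sample SD replaces the population SD
  (mean(x) - mu0) / (sd(x) / sqrt(length(x)))
}

x = c(5.1, 4.8, 5.6, 5.0, 5.3)  # made-up sample
mu0 = 5                         # hypothesised population mean (assumed)
sigma = 0.4                     # population SD, assumed known for the Z Test

z_obs = z_stat(x, mu0, sigma)
t_obs = t_stat(x, mu0)

# Two-sided p-values from the Normal and t_{n-1} curves respectively
p_z = 2 * pnorm(abs(z_obs), lower.tail = FALSE)
p_t = 2 * pt(abs(t_obs), df = length(x) - 1, lower.tail = FALSE)
c(z = z_obs, t = t_obs, p_z = p_z, p_t = p_t)
```

The T version should agree with t.test(x, mu = mu0), which is a quick way to check the hand calculation.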


When you finish: Upload your answers to the Lab12 link on Ed.



1 Have a go

Explain the structure of a T Test using an annotated diagram and an example.









2 Caffeine and endurance (1 Sample Z Test)

A study considered caffeine's effect on endurance, with a double blind, random order administration of caffeine capsules for 9 elite cyclists.

The following data is the time to exhaustion after 0 and 13 mg caffeine per kg body weight, with cafdiff representing the effect of caffeine on endurance.

Assume that we know the population SD is 12 mins, from a previous study.

caf0 = c(36.05, 52.47, 56.55, 45.2, 35.25, 66.38, 40.57, 57.15, 28.34)  # no caffeine (base-line endurance)
caf13 = c(37.55, 59.3, 79.12, 58.33, 70.54, 69.47, 46.48, 66.35, 36.2)  # 13 mg caffeine per kg body weight
cafdiff = caf13- caf0  # This represents the 'effect' of caffeine on endurance.
cafdiff
## [1]  1.50  6.83 22.57 13.13 35.29  3.09  5.91  9.20  7.86

Test the claim that caffeine does not affect endurance, i.e. the mean of cafdiff is 0.

Note:

  • We define “endurance” to be “time to exhaustion” here.
  • This is a 1 sample Z test, as we are using the 9 values in cafdiff, and we know the population SD = 12.

2.1 Preparation

Draw a simple box model, with any population and sample details identified.











2.2 Hypothesis Test

H

  • Write down the null and alternative hypotheses.

Ho: The mean effect of caffeine on endurance (time to exhaustion) is 0.

H1: The mean effect of caffeine on endurance is not 0.

  • Add the null hypothesis to the box model above, with the specific mean of the box. As we have the population SD, we can use the Z Test.

A

Write down any assumptions.

T

  • For a sample of size 9 from the box, what is the expected value (EV) and standard error (SE) of the Mean of the sample?
n = 9
ev = 0    # from Ho in box model
se = 12/sqrt(n)



  • What is the formula for the test statistic?

  • What is the observed value of the test statistic?

ov = mean(cafdiff)
ts = (ov-ev)/se
ts
## [1] 2.927222

P

Using the Normal curve to model the Mean of the sample, what is the approximate p-value?

2*pnorm(ts, lower.tail = FALSE)
## [1] 0.003420044
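
The two-sided p-value above doubles one tail area. As a hedged sketch of the same idea, the two tails can be added explicitly and the result compared with a significance level. The value alpha = 0.05 is an assumption here, as the worksheet does not fix one.

```r
caf0 = c(36.05, 52.47, 56.55, 45.2, 35.25, 66.38, 40.57, 57.15, 28.34)
caf13 = c(37.55, 59.3, 79.12, 58.33, 70.54, 69.47, 46.48, 66.35, 36.2)
cafdiff = caf13 - caf0

n = length(cafdiff)                        # 9 cyclists
ts = (mean(cafdiff) - 0) / (12 / sqrt(n))  # Z statistic, population SD = 12

# Area in each tail of the standard Normal curve beyond |ts|
lower_tail = pnorm(-abs(ts))
upper_tail = pnorm(abs(ts), lower.tail = FALSE)
p_value = lower_tail + upper_tail
p_value

alpha = 0.05       # assumed significance level, not given in the worksheet
p_value < alpha    # TRUE means the data are inconsistent with Ho at this level
```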



C

What is the conclusion?



3 Caffeine and endurance (1 Sample T Test)

A study considered caffeine's effect on endurance, with a double blind, random order administration of caffeine capsules for 9 elite cyclists.

The following data is the time to exhaustion after 0 and 13 mg caffeine per kg body weight, with cafdiff representing the effect of caffeine on endurance.

caf0 = c(36.05, 52.47, 56.55, 45.2, 35.25, 66.38, 40.57, 57.15, 28.34)  # no caffeine (base-line endurance)
caf13 = c(37.55, 59.3, 79.12, 58.33, 70.54, 69.47, 46.48, 66.35, 36.2)  # 13 mg caffeine per kg body weight
cafdiff = caf13- caf0  # This represents the 'effect' of caffeine on endurance.
cafdiff
## [1]  1.50  6.83 22.57 13.13 35.29  3.09  5.91  9.20  7.86

Test the claim that caffeine does not affect endurance, i.e. the mean of cafdiff is 0.

Note: Now, we do not know the population SD, so we will perform a 1 sample T Test.

3.1 Preparation

Draw a simple box model, with any population and sample details identified.











3.2 Hypothesis Test

H

  • Write down the null and alternative hypotheses.

Ho: The mean effect of caffeine on endurance (time to exhaustion) is 0.

H1: The mean effect of caffeine on endurance is not 0.

  • Add the null hypothesis to the box model above, with the specific mean of the box. As we do not have the population SD, we will use the T Test.

A

Write down any assumptions.

T

  • For a sample of size 9 from the box, what is the expected value (EV) and standard error (SE) of the Mean of the sample?
n = 9
ev = 0    # from Ho in box model
se = sd(cafdiff)/sqrt(n)



  • What is the formula for the test statistic?

  • What is the observed value of the test statistic?

ov = mean(cafdiff)
ts = (ov-ev)/se
ts
## [1] 3.252508

P

Using a \(t_{8}\) curve (n - 1 = 8 degrees of freedom) to model the Mean of the sample, what is the approximate p-value?

2*pt(ts, n-1, lower.tail = FALSE)
## [1] 0.01165724



C

What is the conclusion?



The speedy way to do all this is:

t.test(cafdiff, mu = 0)
## 
##  One Sample t-test
## 
## data:  cafdiff
## t = 3.2525, df = 8, p-value = 0.01166
## alternative hypothesis: true mean is not equal to 0
## 95 percent confidence interval:
##   3.407372 20.010405
## sample estimates:
## mean of x 
##  11.70889
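
The 95 percent confidence interval in this output can be rebuilt by hand as a check. The sketch below assumes the usual form, sample mean plus or minus a \(t_{8}\) critical value times the SE, and uses the printed cafdiff values.

```r
cafdiff = c(1.50, 6.83, 22.57, 13.13, 35.29, 3.09, 5.91, 9.20, 7.86)

n = length(cafdiff)
se = sd(cafdiff) / sqrt(n)       # SE of the sample mean, using the sample SD
t_crit = qt(0.975, df = n - 1)   # cuts off 2.5% in the upper tail of t_8

# Sample mean +/- critical value * SE
ci = mean(cafdiff) + c(-1, 1) * t_crit * se
ci
```

The interval excludes 0, which agrees with the two-sided p-value being below 0.05.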





4 Caffeine and heart rate (2 Sample T Test)

Consider the following data on heart rates (beats per minute), for 2 independent groups of Sydney students, collected 20 minutes after the ‘RedBull’ group had drunk a 250ml cold can of Red Bull.

No_RB = 84,76,68,80,64,62,74,84,68,96,80,64,65,66

RB = 72,88,72,88,76,75,84,80,60,96,80,84

Test the claim that Red Bull (caffeine) has an effect on heart rate.

Note:

  • We are comparing 2 independent populations, so we will use a 2 sample T Test.
  • You can use the information given in t.test.

No_RB = c(84,76,68,80,64,62,74,84,68,96,80,64,65,66)
RB = c(72,88,72,88,76,75,84,80,60,96,80,84)
t.test(No_RB, RB, var.equal = TRUE)
## 
##  Two Sample t-test
## 
## data:  No_RB and RB
## t = -1.5418, df = 24, p-value = 0.1362
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  -13.892538   2.011586
## sample estimates:
## mean of x mean of y 
##  73.64286  79.58333
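
The numbers in this output can be reproduced by hand. The sketch below assumes the standard pooled-variance form of the 2 sample T Test, which is what var.equal = TRUE requests; the variable names are illustrative.

```r
No_RB = c(84, 76, 68, 80, 64, 62, 74, 84, 68, 96, 80, 64, 65, 66)
RB = c(72, 88, 72, 88, 76, 75, 84, 80, 60, 96, 80, 84)

n1 = length(No_RB)
n2 = length(RB)
df = n1 + n2 - 2

# Pooled SD: weighted average of the two sample variances
sp = sqrt(((n1 - 1) * var(No_RB) + (n2 - 1) * var(RB)) / df)

# SE of the difference between the 2 sample means
se = sp * sqrt(1 / n1 + 1 / n2)

ov = mean(No_RB) - mean(RB)  # observed difference in means
ev = 0                       # from Ho
ts = (ov - ev) / se
p_value = 2 * pt(abs(ts), df, lower.tail = FALSE)
c(t = ts, df = df, p = p_value)
```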

4.1 Preparation

Draw a simple box model with the 2 populations and 2 samples, with any population and sample details identified.











4.2 Hypothesis Test

H

  • Write down the null and alternative hypotheses.

Ho: The difference between the mean heart rates of the 2 populations (with and without Red Bull) is 0.

H1: The difference between the mean heart rates of the 2 populations (with and without Red Bull) is not 0.

  • Add the null hypothesis to the box model above, with the specific mean difference of the boxes.

A

Write down any assumptions.

T

From the R Output, what is the observed value of the test statistic?

P

From the R Output, what is the p-value?
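
As an alternative to reading the values off the printed output, they can be pulled from the object that t.test returns. This is a minimal sketch; the name res is illustrative.

```r
No_RB = c(84, 76, 68, 80, 64, 62, 74, 84, 68, 96, 80, 64, 65, 66)
RB = c(72, 88, 72, 88, 76, 75, 84, 80, 60, 96, 80, 84)

res = t.test(No_RB, RB, var.equal = TRUE)

t_obs = unname(res$statistic)   # observed value of the test statistic
df_obs = unname(res$parameter)  # degrees of freedom
p_obs = res$p.value             # two-sided p-value
c(t = t_obs, df = df_obs, p = p_obs)
```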



C

What is the conclusion?



4.3 Diagnostic checks for the assumptions

A1. The 2 samples are independent

Given in context.


A2. The 2 populations have equal spread (SD/variance)


var.test(No_RB,RB)
## 
##  F test to compare two variances
## 
## data:  No_RB and RB
## F = 1.1357, num df = 13, denom df = 11, p-value = 0.8428
## alternative hypothesis: true ratio of variances is not equal to 1
## 95 percent confidence interval:
##  0.334832 3.631266
## sample estimates:
## ratio of variances 
##           1.135659
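
The F statistic reported by var.test is the ratio of the 2 sample variances. The following sketch recomputes it and its two-sided p-value by hand; the variable names are illustrative.

```r
No_RB = c(84, 76, 68, 80, 64, 62, 74, 84, 68, 96, 80, 64, 65, 66)
RB = c(72, 88, 72, 88, 76, 75, 84, 80, 60, 96, 80, 84)

f_ratio = var(No_RB) / var(RB)  # ratio of sample variances
df1 = length(No_RB) - 1         # numerator df
df2 = length(RB) - 1            # denominator df

# Two-sided p-value: double the smaller tail area of the F curve
p_lower = pf(f_ratio, df1, df2)
p_value = 2 * min(p_lower, 1 - p_lower)
c(F = f_ratio, p = p_value)
```

A ratio close to 1 with a large p-value is consistent with the equal spread assumption.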


A3. The 2 populations are Normal

Boxplots look fairly symmetric.
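
The boxplots are not reproduced here. One way they could be drawn is with base R's boxplot function; this is a sketch, and the axis label and title are illustrative choices rather than the worksheet's own.

```r
No_RB = c(84, 76, 68, 80, 64, 62, 74, 84, 68, 96, 80, 64, 65, 66)
RB = c(72, 88, 72, 88, 76, 75, 84, 80, 60, 96, 80, 84)

# Side-by-side boxplots, one per group; bp stores the plotted summaries
bp = boxplot(list(No_RB = No_RB, RB = RB),
             ylab = "Heart rate (beats per minute)",
             main = "Heart rate by group")
bp$stats  # whisker ends, quartiles and median for each group
```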

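library(ggplot2)
# RB_data is not defined in this worksheet; it is assumed here to be a
# long-format data frame with one row per student (columns: rate, group).
RB_data = data.frame(rate = c(No_RB, RB),
  group = rep(c("No_RB", "RB"), times = c(length(No_RB), length(RB))))
p3 = ggplot(RB_data, aes(sample = rate, colour = group)) +
  stat_qq() + stat_qq_line() + ggtitle("QQplot")
p3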
shapiro.test(No_RB)
## 
##  Shapiro-Wilk normality test
## 
## data:  No_RB
## W = 0.90604, p-value = 0.138
shapiro.test(RB)
## 
##  Shapiro-Wilk normality test
## 
## data:  RB
## W = 0.97459, p-value = 0.9524